% LaTeX source for textbook ``How to think like a computer scientist''
% Copyright (c)  2001  Allen B. Downey, Jeffrey Elkner, and Chris Meyers.

% Permission is granted to copy, distribute and/or modify this
% document under the terms of the GNU Free Documentation License,
% Version 1.1  or any later version published by the Free Software
% Foundation; with the Invariant Sections being "Contributor List",
% with no Front-Cover Texts, and with no Back-Cover Texts. A copy of
% the license is included in the section entitled "GNU Free
% Documentation License".

% This distribution includes a file named fdl.tex that contains the text
% of the GNU Free Documentation License.  If it is missing, you can obtain
% it from www.gnu.org or by writing to the Free Software Foundation,
% Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

\chapter{Dictionaries}
\index{dictionary}

\index{dictionary}
\index{data type!dictionary}
\index{type!dict}
\index{key}
\index{key-value pair}
\index{index}

The compound types you have learned about---strings, lists, and
tuples---use integers as indices.  If you try to use any other type as
an index, you get an error.

{\bf Dictionaries} are similar to other compound types except that
they can use any immutable type as an index.  As an example, we will
create a dictionary to translate English words into Spanish.  For this
dictionary, the indices are {\tt strings}.

One way to create a dictionary is to start with the empty dictionary
and add elements.  The empty dictionary is denoted {\verb+{}+}:

\beforeverb
\begin{verbatim}
>>> eng2sp = {}
>>> eng2sp['one'] = 'uno'
>>> eng2sp['two'] = 'dos'
\end{verbatim}
\afterverb
%
The first assignment creates a dictionary named {\tt eng2sp}; the
other assignments add new elements to the dictionary.  We can print
the current value of the dictionary in the usual way:

\beforeverb
\begin{verbatim}
>>> print eng2sp
{'one': 'uno', 'two': 'dos'}
\end{verbatim}
\afterverb
%
The elements of a dictionary appear in a comma-separated list.  Each
entry contains an index and a value separated by a colon.  In a
dictionary, the indices are called {\bf keys}, so the elements are
called {\bf key-value pairs}.

Another way to create a dictionary is to provide a list of key-value
pairs using the same syntax as the previous output:

\beforeverb
\begin{verbatim}
>>> eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
\end{verbatim}
\afterverb
%
If we print the value of {\tt eng2sp} again, we get a surprise:

\beforeverb
\begin{verbatim}
>>> print eng2sp
{'one': 'uno', 'three': 'tres', 'two': 'dos'}
\end{verbatim}
\afterverb
%
The key-value pairs are not in order!  Fortunately, there is no reason
to care about the order, since the elements of a dictionary are never
indexed with integer indices.  Instead, we use the keys to look up the
corresponding values:

\beforeverb
\begin{verbatim}
>>> print eng2sp['two']
'dos'
\end{verbatim}
\afterverb
%
The key {\tt 'two'} yields the value {\tt 'dos'} even though it
appears in the third key-value pair.


\section{Dictionary operations}
\index{dictionary!operation}
\index{operation!dictionary}

The {\tt del} statement removes a key-value pair from a dictionary.
For example, the following dictionary contains the names of various
fruits and the number of each fruit in stock:

\beforeverb
\begin{verbatim}
>>> inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 
'pears': 217}
>>> print inventory
{'oranges': 525, 'apples': 430, 'pears': 217, 'bananas': 312}
\end{verbatim}
\afterverb
%
If someone buys all of the pears, we can remove the entry from the
dictionary:

\beforeverb
\begin{verbatim}
>>> del inventory['pears']
>>> print inventory
{'oranges': 525, 'apples': 430, 'bananas': 312}
\end{verbatim}
\afterverb
%
Or if we're expecting more pears soon, we might just change the
value associated with pears:

\beforeverb
\begin{verbatim}
>>> inventory['pears'] = 0
>>> print inventory
{'oranges': 525, 'apples': 430, 'pears': 0, 'bananas': 312}
\end{verbatim}
\afterverb
%
The {\tt len} function also works on dictionaries; it returns the
number of key-value pairs:

\beforeverb
\begin{verbatim}
>>> len(inventory)
4
\end{verbatim}
\afterverb
%


\section{Dictionary methods}
\index{dictionary!method}
\index{method!dictionary}
\index{method}
\index{method!invocation}
\index{invoking method}

A {\bf method} is similar to a function---it takes arguments and
returns a value---but the syntax is different.  For example, the {\tt
keys} method takes a dictionary and returns a list of the keys that
appear, but instead of the function syntax {\tt keys(eng2sp)}, we use
the method syntax {\tt eng2sp.keys()}.

\index{dot notation}

\beforeverb
\begin{verbatim}
>>> eng2sp.keys()
['one', 'three', 'two']
\end{verbatim}
\afterverb
%
This form of dot notation specifies the name of the function, {\tt
keys}, and the name of the object to apply the function to, {\tt
eng2sp}.  The parentheses indicate that this method has no
parameters.

A method call is called an {\bf invocation}; in this case, we would
say that we are invoking {\tt keys} on the object {\tt eng2sp}.

The {\tt values} method is similar; it returns a list of the values in
the dictionary:

\beforeverb
\begin{verbatim}
>>> eng2sp.values()
['uno', 'tres', 'dos']
\end{verbatim}
\afterverb
%
The {\tt items} method returns both, in the form of a
list of tuples---one for each key-value
pair:

\beforeverb
\begin{verbatim}
>>> eng2sp.items()
[('one','uno'), ('three', 'tres'), ('two', 'dos')]
\end{verbatim}
\afterverb
%
The syntax provides useful type information.  The square brackets
indicate that this is a list.  The parentheses indicate that
the elements of the list are tuples.

If a method takes an argument, it uses the same syntax as a function
call.  For example, the method {\tt has\_key} takes a key
and returns
true (1) if the key appears in the dictionary:

\beforeverb
\begin{verbatim}
>>> eng2sp.has_key('one')
True
>>> eng2sp.has_key('deux')
False
\end{verbatim}
\afterverb
%
If you try to call a method without specifying an object, you get an
error.  In this case, the error message is not very helpful:

\beforeverb
\begin{verbatim}
>>> has_key('one')
NameError: has_key
\end{verbatim}
\afterverb
%

\index{runtime error}


\section{Aliasing and copying}
\index{aliasing}
\index{copying}
\index{cloning}

Because dictionaries are mutable, you need to be aware of aliasing.
Whenever two variables refer to the same object, changes to one affect
the other.

If you want to modify a dictionary and keep a copy of the original,
use the {\tt copy} method.  For example, {\tt opposites} is a
dictionary that contains pairs of opposites:

\beforeverb
\begin{verbatim}
>>> opposites = {'up': 'down', 'right': 'wrong', 'true': 'false'}
>>> alias = opposites
>>> copy = opposites.copy()
\end{verbatim}
\afterverb
%
{\tt alias} and {\tt opposites} refer to the same object; {\tt copy}
refers to a fresh copy of the same dictionary.  If we modify {\tt
alias}, {\tt opposites} is also changed:

\beforeverb
\begin{verbatim}
>>> alias['right'] = 'left'
>>> opposites['right']
'left'
\end{verbatim}
\afterverb
%
If we modify {\tt copy}, {\tt opposites} is unchanged:

\beforeverb
\begin{verbatim}
>>> copy['right'] = 'privilege'
>>> opposites['right']
'left'
\end{verbatim}
\afterverb
%


\section{Sparse matrices }
\index{matrix!sparse}
\index{nested list}
\index{list!nested}

In Section~\ref{nested lists}, we used a list of lists to represent a
matrix.  That is a good choice for a matrix with mostly nonzero
values, but consider a sparse matrix like this one:

\beforefig
\centerline{\psfig{figure=illustrations/sparse.eps}}
\afterfig

The list representation contains a lot of zeroes:

\beforeverb
\begin{verbatim}
matrix = [ [0,0,0,1,0],
           [0,0,0,0,0],
           [0,2,0,0,0],
           [0,0,0,0,0],
           [0,0,0,3,0] ]
\end{verbatim}
\afterverb
%
An alternative is to use a dictionary.
For the keys, we can use tuples that contain the row and column
numbers.  Here is the dictionary representation of the same matrix:

\beforeverb
\begin{verbatim}
matrix = {(0,3): 1, (2, 1): 2, (4, 3): 3}
\end{verbatim}
\afterverb
%
We only need three key-value pairs, one for each nonzero element of the
matrix.  Each key is a tuple, and each value is an integer.

To access an element of the matrix, we could use the {\tt []}
operator:

\beforeverb
\begin{verbatim}
matrix[0,3]
1
\end{verbatim}
\afterverb
%
Notice that the syntax for the dictionary representation is not the
same as the syntax for the nested list representation.  Instead of
two integer indices, we use one index, which is a tuple of integers.

There is one problem.
If we specify an element that is zero, we get an
error, because there is no entry in the dictionary with that key:

\index{runtime error}

\beforeverb
\begin{verbatim}
>>> matrix[1,3]
KeyError: (1, 3)
\end{verbatim}
\afterverb
%
The {\tt get} method solves this problem:

\beforeverb
\begin{verbatim}
>>> matrix.get((0,3), 0)
1
\end{verbatim}
\afterverb
%
The first argument is the key; the second argument is the value
{\tt get} should return if the key is not in the dictionary:

\beforeverb
\begin{verbatim}
>>> matrix.get((1,3), 0)
0
\end{verbatim}
\afterverb
%
{\tt get} definitely improves the semantics of accessing
a sparse matrix.  Shame about the syntax.


\section{Hints}
\index{hint}
\index{Fibonacci function}

If you played around with the {\tt fibonacci} function from
Section~\ref{one more example}, you might have noticed that the bigger
the argument you provide, the longer the function takes to run.
Furthermore, the run time increases very quickly.  On one of our
machines, {\tt fibonacci(20)} finishes instantly, {\tt fibonacci(30)}
takes about a second, and {\tt fibonacci(40)} takes roughly forever.

To understand why, consider this {\bf call graph} for
{\tt fibonacci} with {\tt n=4}:

\beforefig
\centerline{\psfig{figure=illustrations/fibonacci.eps,height=2in}}
\afterfig

A call graph shows a set function frames, with lines connecting each
frame to the frames of the functions it calls.  At the top of the
graph, {\tt fibonacci} with {\tt n=4} calls {\tt fibonacci} with {\tt
n=3} and {\tt n=2}.  In turn, {\tt fibonacci} with {\tt n=3} calls
{\tt fibonacci} with {\tt n=2} and {\tt n=1}.  And so on.

\index{function frame}
\index{frame}
\index{call graph}

Count how many times {\tt fibonacci(0)} and {\tt fibonacci(1)} are
called.  This is an inefficient solution to the problem, and it gets
far worse as the argument gets bigger.

A good solution is to keep track of values that have already been
computed by storing them in a dictionary.  A previously computed value
that is stored for later use is called a {\bf hint}.  Here is
an implementation of {\tt fibonacci} using hints:

\beforeverb
\begin{verbatim}
previous = {0:1, 1:1}

def fibonacci(n):
  if previous.has_key(n):
    return previous[n]
  else:
    newValue = fibonacci(n-1) + fibonacci(n-2)
    previous[n] = newValue
    return newValue
\end{verbatim}
\afterverb
%
The dictionary named {\tt previous} keeps track of the Fibonacci
numbers we already know.  We start with only
two pairs: 0 maps to 1; and 1 maps to 1.

Whenever {\tt fibonacci} is called, it checks the dictionary to
determine if it contains the result.
If it's there, the function can return
immediately without making any more recursive calls.  If not, it has
to compute the new value.  The new value is added to the dictionary
before the function returns.

Using this version of {\tt fibonacci}, our machines can compute
{\tt fibonacci(40)} in an eyeblink.  But when we try to compute
{\tt fibonacci(50)}, we see the following:

\beforeverb
\begin{verbatim}
>>> fibonacci(50)
20365011074L
\end{verbatim}
\afterverb
%
The {\tt L} at the end of the result indicates that the answer
+(20,365,011,074) is too big to fit into a Python integer.  Python
has automatically converted the result to a long integer.

\section{Long integers}
\index{long integer}
\index{data type!long integer}
\index{type!long}
\index{integer!long}

Python provides a type called {\tt long} that can handle any size
integer.  There are two ways to create a {\tt long} value.  One is
to write an integer with a capital {\tt L} at the end:

\beforeverb
\begin{verbatim}
>>> type(1L)
<type 'long'>
\end{verbatim}
\afterverb
%
The other is to use the {\tt long} function to convert a value to a
{\tt long}.  {\tt long} can accept any numerical type and even
strings of digits:

\index{type coercion}
\index{coercion!type}

\beforeverb
\begin{verbatim}
>>> long(1)
1L
>>> long(3.9)
3L
>>> long('57')
57L
\end{verbatim}
\afterverb
%
All of the math operations work on {\tt long}s, so in general
any code that works with integers will also work with long
integers.  Any time the result of a computation is too big
to be represented with an integer, Python detects the overflow
and returns the result as a long integer.  For example:

\beforeverb
\begin{verbatim}
>>> 1000 * 1000
1000000
>>> 100000 * 100000
10000000000L
\end{verbatim}
\afterverb
%
In the first case the result has type {\tt int}; in the
second case it is {\tt long}.


\section{Counting letters}
\index{counting}
\index{histogram}
\index{compression}

In Chapter~\ref{strings}, we wrote a function that counted the number
of occurrences of a letter in a string.  A more general version of
this problem is to form a histogram of the letters in the string, that
is, how many times each letter appears.

Such a histogram might be useful for compressing a text
file.  Because different letters appear with different frequencies, we
can compress a file by using shorter codes for common letters and
longer codes for letters that appear less frequently.

Dictionaries provide an elegant way to generate a histogram:

\beforeverb
\begin{verbatim}
>>> letterCounts = {}
>>> for letter in "Mississippi":
...   letterCounts[letter] = letterCounts.get (letter, 0) + 1
...
>>> letterCounts
{'M': 1, 's': 4, 'p': 2, 'i': 4}
\end{verbatim}
\afterverb
%
We start with an empty dictionary.  For each letter in the string, we
find the current count (possibly zero) and increment it.  At the end,
the dictionary contains pairs of letters and their frequencies.

It might be more appealing to display the histogram in alphabetical
order.  We can do that with the {\tt items} and {\tt sort} methods:

\beforeverb
\begin{verbatim}
>>> letterItems = letterCounts.items()
>>> letterItems.sort()
>>> print letterItems
[('M', 1), ('i', 4), ('p', 2), ('s', 4)]
\end{verbatim}
\afterverb
%
You have seen the {\tt items} method before, but {\tt sort} is the
first method you have encountered that applies to lists.  There are
several other list methods, including {\tt append}, {\tt extend}, and
{\tt reverse}.  Consult the Python documentation for details.

\index{method!list}
\index{list method}


\section{Glossary}

\begin{description}

\item[dictionary:] A collection of key-value pairs that maps from
keys to values.
The keys can be any immutable type, and the values can be any type.

\item[key:] A value that is used to look up an entry in a dictionary.

\item[key-value pair:] One of the items in a dictionary.

\item[method:] A kind of function that is called with a different syntax and
invoked ``on'' an object.

\item[invoke:] To call a method.

\item[hint:] Temporary storage of a precomputed value to avoid redundant
computation.

\item[overflow:] A numerical result that is too large to be represented
in a numerical format.

\index{dictionary}
\index{key}
\index{key-value pair}
\index{hint}
\index{method}
\index{invoke}

\end{description}
